From BM25 to Mixture-of-Encoders: Evaluations for Next-Gen Search and Retrieval Systems

Filip Makraduli • Location: TUECHTIG • Haystack EU 2024

Satisfying modern user queries requires combining structured and unstructured data, and this is where traditional search methods fall short. In this talk, we dive into retrieval evaluation, comparing keyword, vector, hybrid, and late-interaction models with Superlinked’s mixture-of-encoders approach. We examine how each approach fares in real-world scenarios (e.g. a query for “5 guests under $200 with 4.8+ rating”). Using benchmark datasets and real production use cases, we share metrics, evaluation methodology, and common pitfalls. In the mixture-of-encoders approach, dedicated encoders for data types such as text, numbers, and categories are combined with LLM-driven query understanding to enable more accurate and scalable retrieval. Finally, we discuss how to productionize this system and share use cases from travel to e-commerce, pointing toward the future of multi-attribute, metadata-aware embedding search.
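
To make the idea of dedicated per-type encoders concrete, here is a minimal sketch in plain Python. It is not Superlinked's API or implementation: the function names, the quarter-circle number encoding, the toy bag-of-words text encoder, the weights, and the two example listings are all assumptions chosen for illustration. Each attribute gets its own encoder, and the final score is a weighted sum of per-attribute similarities.

```python
import math

def encode_number(value, lo, hi):
    """Map a number onto a quarter circle so closer values have a larger dot product.
    (One simple choice for illustration; an assumption, not a library's method.)"""
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    angle = t * math.pi / 2
    return [math.cos(angle), math.sin(angle)]

def encode_category(value, vocabulary):
    """One-hot encoding over a fixed list of known categories."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def encode_text(text, vocabulary):
    """Toy L2-normalised bag-of-words; a stand-in for a real text embedding model."""
    tokens = text.lower().split()
    counts = [float(tokens.count(w)) for w in vocabulary]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical vocabularies and value ranges.
WORDS = ["cozy", "beach", "apartment", "villa"]
TYPES = ["apartment", "villa"]

def encode_item(item):
    """Encode every attribute with its own dedicated encoder."""
    return {
        "text": encode_text(item["text"], WORDS),
        "price": encode_number(item["price"], 0, 500),
        "rating": encode_number(item["rating"], 0, 5),
        "type": encode_category(item["type"], TYPES),
    }

def score(query_vecs, item_vecs, weights):
    """Weighted sum of per-attribute dot products."""
    return sum(weights[k] * dot(query_vecs[k], item_vecs[k]) for k in weights)

# Two hypothetical listings that differ only in price and rating.
listings = {
    "A": {"text": "cozy beach apartment", "price": 150, "rating": 4.9, "type": "apartment"},
    "B": {"text": "cozy beach apartment", "price": 450, "rating": 4.2, "type": "apartment"},
}

# A structured query such as an LLM-based parser might produce from free text (assumed).
query = {"text": "beach apartment", "price": 100, "rating": 5.0, "type": "apartment"}
weights = {"text": 1.0, "price": 1.0, "rating": 1.0, "type": 0.5}

query_vecs = encode_item(query)
scores = {name: score(query_vecs, encode_item(item), weights) for name, item in listings.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this sketch the text match is identical for both listings, so the ranking is decided by the numeric encoders: the listing whose price and rating sit closer to the query values scores higher. A text-only embedding would have no way to tell the two apart.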

Filip Makraduli

Superlinked